AI for Medical Prognosis
These notes are based on the Coursera course "AI for Medical Prognosis".
What is Medical Prognosis
- prognosis = Predicting the risk of a future event
- risk of illness
- survival probability with illness
- examples
- use clinical history, physical examinations, labs, imaging, etc. to predict a risk score or survival probability
Build & Evaluate Risk Models
Risk model types
- linear risk model
- risk score = a linear sum over the features of coefficient × feature value
- tree-based model
- be cautious about missing data
- identifying missing data types
- missing completely at random = missingness does not depend on anything → no bias
- missing at random = missingness depends only on the available (observed) information, e.g. on the value of another recorded feature
- missing not at random = missingness depends on unavailable (unobserved) information
- use data imputation to handle missing data (Data preprocessing#^594d05)
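The two ideas above (a linear risk score and imputing missing values before scoring) can be sketched together. This is a minimal illustration, not the course's implementation: the helper names `mean_impute` and `linear_risk_score`, the two features, and the coefficient values are all made up, and mean imputation is only one of several possible strategies.

```python
def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def linear_risk_score(features, coefficients):
    """Risk score = sum over features of coefficient * feature value."""
    return sum(c * x for c, x in zip(coefficients, features))


# Hypothetical patients: [age, systolic blood pressure]; None marks a missing value.
patients = [[60.0, 140.0], [50.0, None], [70.0, 160.0]]
coefficients = [0.5, 0.25]  # made-up coefficients, for illustration only

imputed = mean_impute(patients)
scores = [linear_risk_score(p, coefficients) for p in imputed]
print(imputed[1])  # the missing blood pressure is filled with the column mean
print(scores)
```

Mean imputation is easiest to justify when data are missing completely at random; for the other two missingness types it can bias the scores.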
Risk model evaluation
- Since the risk score can take any value, it is evaluated by comparing patients in pairs. Within a group of patients with predictions:
- concordant pair: patient A has a worse outcome than patient B, and A’s risk score is higher than B’s risk score
- risk tie: the two patients’ risk scores are the same
- permissible pair: the two patients’ outcomes are not the same
- Thus:
- +1 for a permissible pair that is concordant
- +0.5 for a permissible pair that is a risk tie
- C-index = (number of concordant pairs + 0.5 × number of risk ties) / number of permissible pairs (max 1; 0.5 for a random or constant risk score)
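The pairwise counting can be written out directly. The sketch below assumes binary outcomes (1 = event, 0 = no event) and computes the C-index as (concordant + 0.5 × risk ties) / permissible; the function name `c_index` and the toy data are illustrative only.

```python
from itertools import combinations


def c_index(outcomes, scores):
    """C-index over all permissible pairs (pairs with different outcomes)."""
    concordant = ties = permissible = 0
    for (y_a, s_a), (y_b, s_b) in combinations(zip(outcomes, scores), 2):
        if y_a == y_b:
            continue  # same outcome: not a permissible pair
        permissible += 1
        if s_a == s_b:
            ties += 1  # risk tie, counted as 0.5
        elif (y_a > y_b) == (s_a > s_b):
            concordant += 1  # worse outcome has the higher score
    if permissible == 0:
        raise ValueError("no permissible pairs")
    return (concordant + 0.5 * ties) / permissible


# Toy data: 4 permissible pairs, 3 concordant and 1 risk tie.
outcomes = [1, 0, 1, 0]
scores = [0.8, 0.3, 0.5, 0.5]
print(c_index(outcomes, scores))  # 0.875
```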
Build & Evaluate Survival Model
What a survival model tells us
- What is the probability of survival past any time t?
- With survival functions: S(t) = Pr(T>t)
- never increasing in t: it starts at S(0) = 1 and falls toward 0, since the longer the t, the harder it is to survive past it
Survival data structure
- in survival data, the labels are amounts of time to event
- censored observations: the event was not observed within the observation period:
- end-of-study censoring (no event before the study ended)
- loss-to-follow-up censoring (the patient withdrew)
- right censoring = the time to event is only known to exceed a certain value (e.g. a patient censored at 12 months is recorded as 12+)
Estimate survival
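- Naive estimate: let $X_t$ be the set of patients known to have survived past $t$, and $M_t$ the set of patients whose status at $t$ is known (they either had the event or were observed past $t$): $$ \hat{S}(t) = \frac{|X_t|}{|M_t|} $$
- Kaplan Meier estimate: the probability of survival past t months with censored observations $$ S(t) = \prod_{i=0}^{t} (1 - Pr(T = i | T \geq i)) = \prod_{t_i \leq t} (1 - \frac{d_i}{n_i}) $$ where $t_i$ are the event times observed in the dataset, $d_i$ is the number of deaths at time $t_i$, and $n_i$ is the number of people who we know have survived up to time $t_i$.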
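The Kaplan Meier product can be computed directly from (time, event) pairs. This is a minimal sketch under the assumption that `events[i]` is 1 for an observed death and 0 for a censored observation; the function name `kaplan_meier` and the four-patient dataset are hypothetical.

```python
def kaplan_meier(times, events, t):
    """Estimate S(t) as the product of (1 - d_i / n_i) over event times t_i <= t."""
    survival = 1.0
    # Distinct times at which at least one death was observed, up to t.
    event_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for t_i in event_times:
        # d_i: deaths observed exactly at t_i.
        d_i = sum(1 for tj, e in zip(times, events) if tj == t_i and e == 1)
        # n_i: patients known to have survived up to t_i (still at risk).
        n_i = sum(1 for tj in times if tj >= t_i)
        survival *= 1 - d_i / n_i
    return survival


# Hypothetical data: the third patient is censored at time 4.
times = [2, 4, 4, 6]
events = [1, 1, 0, 1]
for t in (1, 2, 4, 6):
    print(t, kaplan_meier(times, events, t))
```

The censored patient still counts toward $n_i$ at time 4, which is how the estimate uses partial information instead of discarding censored rows.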
Survival model types
- Hazard functions
- Hazard $\lambda(t)$: what’s a patient’s immediate risk of death if they make it to time t (risk of death if aged t)
- cumulative hazard $$ \Lambda(t) = \int_0^{t} \lambda(u) du $$
- individual hazard $$ \lambda_{individual}(t) = \lambda_{0}(t) \exp(\sum_i B_i X_i) $$ where $\lambda_{0}(t)$ is a baseline hazard and $B_i$ is the coefficient of factor $X_i$
- relation between survival and hazard: $$ S(t) = \exp(-\int_0^{t} \lambda(u) du) $$
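A small numeric sketch of the hazard formulas above may help. It assumes a constant baseline hazard purely for simplicity (so the cumulative hazard is rate × t); the function names, the two features, and the coefficient values are hypothetical.

```python
import math


def relative_hazard(features, coefficients):
    """exp(sum_i B_i * X_i): the factor that scales the baseline hazard."""
    return math.exp(sum(b * x for b, x in zip(coefficients, features)))


def individual_hazard(baseline_hazard, features, coefficients):
    """lambda_individual(t) = lambda_0(t) * exp(sum_i B_i * X_i)."""
    return baseline_hazard * relative_hazard(features, coefficients)


def survival_constant_hazard(rate, t):
    """S(t) = exp(-Lambda(t)); with a constant hazard, Lambda(t) = rate * t."""
    return math.exp(-rate * t)


# Made-up coefficients for two hypothetical features: [smoker, age in decades].
coefficients = [0.5, 1.0]
smoker, non_smoker = [1, 0], [0, 0]

# The baseline hazard cancels when two patients are compared.
hazard_ratio = (relative_hazard(smoker, coefficients)
                / relative_hazard(non_smoker, coefficients))
print(hazard_ratio)  # equals exp(0.5)
print(survival_constant_hazard(0.25, 4))  # equals exp(-1)
```

Because the baseline hazard cancels in the ratio, the coefficients alone determine how two patients' risks compare at any time t.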
- Survival trees
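- Nelson Aalen Estimator: estimate the cumulative hazard of the population $$ \hat{\Lambda}(t) = \sum_{t_i \leq t} \frac{d_i}{n_i} $$ with $d_i$ deaths and $n_i$ patients still at risk at event time $t_i$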
- Mortality score = a single value per patient, obtained by summing the cumulative hazard over the event times
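The Nelson Aalen sum and the mortality score can be sketched as follows. The code assumes `events[i]` is 1 for an observed death and 0 for censoring, and that the mortality score is computed for one group of patients (for example the patients in one leaf of a survival tree); the names `nelson_aalen` and `mortality_score` and the data are illustrative.

```python
def nelson_aalen(times, events, t):
    """Cumulative hazard estimate: sum of d_i / n_i over event times t_i <= t."""
    total = 0.0
    event_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for t_i in event_times:
        d_i = sum(1 for tj, e in zip(times, events) if tj == t_i and e == 1)
        n_i = sum(1 for tj in times if tj >= t_i)  # still at risk at t_i
        total += d_i / n_i
    return total


def mortality_score(times, events, event_times):
    """Sum of the group's cumulative hazard evaluated at each event time."""
    return sum(nelson_aalen(times, events, t) for t in event_times)


# Hypothetical group of four patients; the third is censored at time 4.
times = [2, 4, 4, 6]
events = [1, 1, 0, 1]
all_event_times = [2, 4, 6]

print(nelson_aalen(times, events, 4))  # 1/4 + 1/3
print(mortality_score(times, events, all_event_times))
```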
Survival model evaluation
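- a variation of prognostic model evaluation, with slightly different definitions of a concordant pair, a risk tie, and a permissible pair for survival data
- here the outcome is the time to the event
- concordant pairs: the patient with the worse outcome (earlier event) has the higher risk score
- permissible pairs: the patients’ event times are not the same, and censoring does not hide which event came first (if one patient is censored, the censoring time must not be earlier than the other patient’s event time; two censored patients do not form a permissible pair)
- Harrell’s C-Index: same formula as the C-index, computed over these pairs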
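Harrell's C-index can be sketched by changing only the permissibility rule. The code below assumes one common convention for censoring: a pair is skipped when both patients are censored, when both have events at the same time, or when the censored patient dropped out before the other patient's event. The function name and data are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, scores):
    """C-index for survival data; events[i] is 1 for an event, 0 for censored."""

    def worse_patient(i, j):
        """Index of the patient with the earlier event, or None if not permissible."""
        if events[i] == 0 and events[j] == 0:
            return None  # both censored: order of events unknown
        if events[i] == 1 and events[j] == 1:
            if times[i] == times[j]:
                return None  # same event time
            return i if times[i] < times[j] else j
        # Exactly one patient is censored.
        ev, cen = (i, j) if events[i] == 1 else (j, i)
        return ev if times[cen] >= times[ev] else None

    concordant = ties = permissible = 0
    for i, j in combinations(range(len(times)), 2):
        worse = worse_patient(i, j)
        if worse is None:
            continue
        other = j if worse == i else i
        permissible += 1
        if scores[worse] == scores[other]:
            ties += 1
        elif scores[worse] > scores[other]:
            concordant += 1
    if permissible == 0:
        raise ValueError("no permissible pairs")
    return (concordant + 0.5 * ties) / permissible


# Hypothetical data: patients 2 and 3 are censored.
times = [5, 10, 8, 12]
events = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.4, 0.4]
print(harrell_c_index(times, events, scores))  # 0.875
```

In this toy data the pair (patient 1, patient 2) is dropped because patient 2 was censored at time 8, before patient 1's event at time 10, so their order is unknown.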